86 research outputs found

    Scalable many-core algorithms for tridiagonal solvers

    Get PDF
    We present a novel distributed memory Tridiagonal solver library, targeting large-scale systems based on modern multi-core and many-core processor architectures. The library uses methods based on both approximate and exact algorithms. Performance comparisons with the state-of-the-art, using both a large Cray EX system and a GPU cluster show the algorithmic trade-offs required at increasing machine scale to achieve good performance, particularly considering the advent of exascale systems

    Batch solution of small PDEs with the OPS DSL

    Get PDF
    In this paper we discuss the challenges and optimisations opportunities when solving a large number of small, equally sized discretised PDEs on regular grids. We present an extension of the OPS (Oxford Parallel library for Structured meshes) embedded Domain Specific Language, and show how support can be added for solving multiple systems, and how OPS makes it easy to deploy a variety of transformations and optimisations. The new capabilities in OPS allow to automatically apply data structure transformations, as well as execution schedule transformations to deliver high performance on a variety of hardware platforms. We evaluate our work on an industrially representative finance simulation on Intel CPUs, as well as NVIDIA GPUs

    Scalable many-core algorithms for tridiagonal solvers

    Get PDF
    We present a novel distributed memory Tridiagonal solver library, targeting large-scale systems based on modern multi-core and many-core processor architectures. The library uses methods based on both approximate and exact algorithms. Performance comparisons with the state-of-the-art, using both a large Cray EX system and a GPU cluster show the algorithmic trade-offs required at increasing machine scale to achieve good performance, particularly considering the advent of exascale systems

    The BioGRID Interaction Database: 2011 update

    Get PDF
    The Biological General Repository for Interaction Datasets (BioGRID) is a public database that archives and disseminates genetic and protein interaction data from model organisms and humans (http://www.thebiogrid.org). BioGRID currently holds 347 966 interactions (170 162 genetic, 177 804 protein) curated from both high-throughput data sets and individual focused studies, as derived from over 23 000 publications in the primary literature. Complete coverage of the entire literature is maintained for budding yeast (Saccharomyces cerevisiae), fission yeast (Schizosaccharomyces pombe) and thale cress (Arabidopsis thaliana), and efforts to expand curation across multiple metazoan species are underway. The BioGRID houses 48 831 human protein interactions that have been curated from 10 247 publications. Current curation drives are focused on particular areas of biology to enable insights into conserved networks and pathways that are relevant to human health. The BioGRID 3.0 web interface contains new search and display features that enable rapid queries across multiple data types and sources. An automated Interaction Management System (IMS) is used to prioritize, coordinate and track curation across international sites and projects. BioGRID provides interaction data to several model organism databases, resources such as Entrez-Gene and other interaction meta-databases. The entire BioGRID 3.0 data collection may be downloaded in multiple file formats, including PSI MI XML. Source code for BioGRID 3.0 is freely available without any restrictions

    Large-scale performance of a DSL-based multi-block structured-mesh application for Direct Numerical Simulation

    Get PDF
    SBLI (Shock-wave/Boundary-layer Interaction) is a large-scale Computational Fluid Dynamics (CFD) application, developed over 20 years at the University of Southampton and extensively used within the UK Turbulence Consortium. It is capable of performing Direct Numerical Simulations (DNS) or Large Eddy Simulation (LES) of shock-wave/boundary-layer interaction problems over highly detailed multi-block structured mesh geometries. SBLI presents major challenges in data organization and movement that need to be overcome for continued high performance on emerging massively parallel hardware platforms. In this paper we present research in achieving this goal through the OPS embedded domain-specific language. OPS targets the domain of multi-block structured mesh applications. It provides an API embedded in C/C++ and Fortran and makes use of automatic code generation and compilation to produce executables capable of running on a range of parallel hardware systems. The core functionality of SBLI is captured using a new framework called OpenSBLI which enables a developer to declare the partial differential equations using Einstein notation and then automatically carryout discretization and generation of OPS (C/C++) API code. OPS is then used to automatically generate a wide range of parallel implementations. Using this multi-layered abstractions approach we demonstrate how new opportunities for further optimizations can be gained, such as fine-tuning the computation intensity and reducing data movement and apply them automatically. Performance results demonstrate there is no performance loss due to the high-level development strategy with OPS and OpenSBLI, with performance matching or exceeding the hand-tuned original code on all CPU nodes tested. The data movement optimizations provide over 3× speedups on CPU nodes, while GPUs provide 5× speedups over the best performing CPU node. The OPS generated parallel code also demonstrates excellent scalability on nearly 100K cores on a Cray XC30 (ARCHER at EPCC) and on over 4K GPUs on a CrayXK7 (Titan at ORNL)

    A human MAP kinase interactome.

    Get PDF
    Mitogen-activated protein kinase (MAPK) pathways form the backbone of signal transduction in the mammalian cell. Here we applied a systematic experimental and computational approach to map 2,269 interactions between human MAPK-related proteins and other cellular machinery and to assemble these data into functional modules. Multiple lines of evidence including conservation with yeast supported a core network of 641 interactions. Using small interfering RNA knockdowns, we observed that approximately one-third of MAPK-interacting proteins modulated MAPK-mediated signaling. We uncovered the Na-H exchanger NHE1 as a potential MAPK scaffold, found links between HSP90 chaperones and MAPK pathways and identified MUC12 as the human analog to the yeast signaling mucin Msb2. This study makes available a large resource of MAPK interactions and clone libraries, and it illustrates a methodology for probing signaling networks based on functional refinement of experimentally derived protein-interaction maps

    Reuse of structural domain–domain interactions in protein networks

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Protein interactions are thought to be largely mediated by interactions between structural domains. Databases such as <it>i</it>Pfam relate interactions in protein structures to known domain families. Here, we investigate how the domain interactions from the <it>i</it>Pfam database are distributed in protein interactions taken from the HPRD, MPact, BioGRID, DIP and IntAct databases.</p> <p>Results</p> <p>We find that known structural domain interactions can only explain a subset of 4–19% of the available protein interactions, nevertheless this fraction is still significantly bigger than expected by chance. There is a correlation between the frequency of a domain interaction and the connectivity of the proteins it occurs in. Furthermore, a large proportion of protein interactions can be attributed to a small number of domain interactions. We conclude that many, but not all, domain interactions constitute reusable modules of molecular recognition. A substantial proportion of domain interactions are conserved between <it>E. coli</it>, <it>S. cerevisiae </it>and <it>H. sapiens</it>. These domains are related to essential cellular functions, suggesting that many domain interactions were already present in the last universal common ancestor.</p> <p>Conclusion</p> <p>Our results support the concept of domain interactions as reusable, conserved building blocks of protein interactions, but also highlight the limitations currently imposed by the small number of available protein structures.</p

    Identifying protein complexes directly from high-throughput TAP data with Markov random fields

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Predicting protein complexes from experimental data remains a challenge due to limited resolution and stochastic errors of high-throughput methods. Current algorithms to reconstruct the complexes typically rely on a two-step process. First, they construct an interaction graph from the data, predominantly using heuristics, and subsequently cluster its vertices to identify protein complexes.</p> <p>Results</p> <p>We propose a model-based identification of protein complexes directly from the experimental observations. Our model of protein complexes based on Markov random fields explicitly incorporates false negative and false positive errors and exhibits a high robustness to noise. A model-based quality score for the resulting clusters allows us to identify reliable predictions in the complete data set. Comparisons with prior work on reference data sets shows favorable results, particularly for larger unfiltered data sets. Additional information on predictions, including the source code under the GNU Public License can be found at http://algorithmics.molgen.mpg.de/Static/Supplements/ProteinComplexes.</p> <p>Conclusion</p> <p>We can identify complexes in the data obtained from high-throughput experiments without prior elimination of proteins or weak interactions. The few parameters of our model, which does not rely on heuristics, can be estimated using maximum likelihood without a reference data set. This is particularly important for protein complex studies in organisms that do not have an established reference frame of known protein complexes.</p

    DroID: the Drosophila Interactions Database, a comprehensive resource for annotated gene and protein interactions

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Charting the interactions among genes and among their protein products is essential for understanding biological systems. A flood of interaction data is emerging from high throughput technologies, computational approaches, and literature mining methods. Quick and efficient access to this data has become a critical issue for biologists. Several excellent multi-organism databases for gene and protein interactions are available, yet most of these have understandable difficulty maintaining comprehensive information for any one organism. No single database, for example, includes all available interactions, integrated gene expression data, and comprehensive and searchable gene information for the important model organism, <it>Drosophila melanogaster</it>.</p> <p>Description</p> <p>DroID, the <it>Drosophila </it>Interactions Database, is a comprehensive interactions database designed specifically for <it>Drosophila</it>. DroID houses published physical protein interactions, genetic interactions, and computationally predicted interactions, including interologs based on data for other model organisms and humans. All interactions are annotated with original experimental data and source information. DroID can be searched and filtered based on interaction information or a comprehensive set of gene attributes from Flybase. DroID also contains gene expression and expression correlation data that can be searched and used to filter datasets, for example, to focus a study on sub-networks of co-expressed genes. To address the inherent noise in interaction data, DroID employs an updatable confidence scoring system that assigns a score to each physical interaction based on the likelihood that it represents a biologically significant link.</p> <p>Conclusion</p> <p>DroID is the most comprehensive interactions database available for <it>Drosophila</it>. To facilitate downstream analyses, interactions are annotated with original experimental information, gene expression data, and confidence scores. All data in DroID are freely available and can be searched, explored, and downloaded through three different interfaces, including a text based web site, a Java applet with dynamic graphing capabilities (IM Browser), and a Cytoscape plug-in. DroID is available at <url>http://www.droidb.org</url>.</p

    Generating confidence intervals on biological networks

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>In the analysis of networks we frequently require the statistical significance of some network statistic, such as measures of similarity for the properties of interacting nodes. The structure of the network may introduce dependencies among the nodes and it will in general be necessary to account for these dependencies in the statistical analysis. To this end we require some form of Null model of the network: generally rewired replicates of the network are generated which preserve only the degree (number of interactions) of each node. We show that this can fail to capture important features of network structure, and may result in unrealistic significance levels, when potentially confounding additional information is available.</p> <p>Methods</p> <p>We present a new network resampling Null model which takes into account the degree sequence as well as available biological annotations. Using gene ontology information as an illustration we show how this information can be accounted for in the resampling approach, and the impact such information has on the assessment of statistical significance of correlations and motif-abundances in the <it>Saccharomyces cerevisiae </it>protein interaction network. An algorithm, GOcardShuffle, is introduced to allow for the efficient construction of an improved Null model for network data.</p> <p>Results</p> <p>We use the protein interaction network of <it>S. cerevisiae</it>; correlations between the evolutionary rates and expression levels of interacting proteins and their statistical significance were assessed for Null models which condition on different aspects of the available data. The novel GOcardShuffle approach results in a Null model for annotated network data which appears better to describe the properties of real biological networks.</p> <p>Conclusion</p> <p>An improved statistical approach for the statistical analysis of biological network data, which conditions on the available biological information, leads to qualitatively different results compared to approaches which ignore such annotations. In particular we demonstrate the effects of the biological organization of the network can be sufficient to explain the observed similarity of interacting proteins.</p
    corecore